Non-Monotonic Sentence Alignment via Semisupervised Learning

Authors

Xiaojun Quan

Chunyu Kit

Yan Song

Abstract

This paper studies the problem of non-monotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically. It proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have counterparts with similar relatedness in the other; and (2) an initial alignment is readily available from existing alignment techniques. These assumptions are incorporated as two constraints into a semisupervised learning framework, which is optimized to produce a globally optimal solution. An evaluation on real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach works effectively on both monotonic and non-monotonic data.
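To make the two constraints concrete, here is a minimal sketch under stated assumptions; it is not the authors' published objective. It assumes a label-propagation style formulation: hypothetical sentence-affinity matrices `w_src` and `w_tgt`, an initial alignment matrix `y` from some existing aligner, and a score matrix F updated as F ← αSFT + (1 − α)Y, where S and T are the symmetrically normalized affinities (D^{-1/2} W D^{-1/2}). The SFT term raises the score of a pair (i, j) when neighbours of i are aligned to neighbours of j (assumption 1), and the Y term keeps F close to the initial alignment (assumption 2). For α < 1 this iteration converges to the minimizer of a smoothness penalty over pairs of sentence pairs plus a squared fit to Y. Decoding picks each source sentence's best target with no ordering constraint, which is what allows non-monotonic output. All function names, parameters, and the toy data are hypothetical.

```python
import math


def normalize(w):
    """Return D^{-1/2} W D^{-1/2}, where D holds the row sums of affinity matrix W."""
    deg = [sum(row) for row in w]
    n = len(w)
    return [
        [
            w[i][k] / math.sqrt(deg[i] * deg[k]) if deg[i] > 0 and deg[k] > 0 else 0.0
            for k in range(n)
        ]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x r)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def propagate_alignment(w_src, w_tgt, y, alpha=0.8, max_iter=500, tol=1e-10):
    """Hypothetical sketch: iterate F <- alpha * S F T + (1 - alpha) * Y.

    S F T spreads alignment scores to pairs whose neighbours are aligned
    (assumption 1); the Y term anchors F to the initial alignment (assumption 2).
    T is symmetric, so S F T equals S F T^T here.
    """
    s, t = normalize(w_src), normalize(w_tgt)
    rows, cols = len(y), len(y[0])
    f = [row[:] for row in y]
    for _ in range(max_iter):
        sft = matmul(matmul(s, f), t)
        new_f = [
            [alpha * sft[i][j] + (1 - alpha) * y[i][j] for j in range(cols)]
            for i in range(rows)
        ]
        delta = max(abs(new_f[i][j] - f[i][j]) for i in range(rows) for j in range(cols))
        f = new_f
        if delta < tol:
            break
    return f


def decode(f):
    """Pair each source sentence with its best-scoring target; no order constraint."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in f]


# Toy bitext (hypothetical): source sentences 0 and 1 are closely related,
# and so are their counterparts, target sentences 2 and 0.
w_src = [[0.0, 1.0, 0.1], [1.0, 0.0, 0.1], [0.1, 0.1, 0.0]]
w_tgt = [[0.0, 0.1, 1.0], [0.1, 0.0, 0.1], [1.0, 0.1, 0.0]]
# Initial aligner output: 0 -> 2 and 2 -> 1 found; source sentence 1 left unaligned.
y_init = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

f = propagate_alignment(w_src, w_tgt, y_init)
print(decode(f))  # expected: [2, 0, 1]
```

In this toy run the counterparts appear in permuted order and the initial aligner misses source sentence 1; propagation recovers the pair (1, 0) from the affinity structure alone, and the decoded target indices are not increasing, i.e. the alignment is non-monotonic. A practical system would likely also need a threshold for unaligned (null) sentences and a sparse representation for long bitexts.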


Similar resources

Semi-Supervised Block ITG Models for Word Alignment

Labeled training data for the word alignment task, in the form of word-aligned sentence pairs, is hard to come by for many language-pairs. Hence, it is natural to draw upon semi-supervised learning methods (Fraser and Marcu, 2006). We introduce a semisupervised learning method for word alignment using conditional entropy regularization (Grandvalet and Bengio, 2005) on top of a BITG-based discri...


Boosting Statistical Word Alignment Using Labeled and Unlabeled Data

This paper proposes a semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach modifies the supervised boosting algorithm to a semisupervised learning algorithm by incorporating the unlabeled data. In this algorithm, we build a word aligner by using both the labeled data and the unlabeled data. T...


Learning speaker normalization using semisupervised manifold alignment

As a child acquires language, he or she: perceives acoustic information in his or her surrounding environment; identifies portions of the ambient acoustic information as language-related; and associates that language-related information with his or her perception of his or her own language-related acoustic productions. The present work models the third task. We use a semisupervised alignment alg...


Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...


Image alignment via kernelized feature learning

Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption for most of the machine learning algorithms is that the training set (source domain) and the test set (target domain) follow from the same probability distribution. However, in most of the real-world application...




Publication date: 2013
